Framewise phoneme classification with bidirectional LSTM and other neural network architectures

نویسندگان

  • Alex Graves
  • Jürgen Schmidhuber
چکیده

In this paper, we present bidirectional Long Short Term Memory (LSTM) networks, and a modified, full gradient version of the LSTM learning algorithm. We evaluate Bidirectional LSTM (BLSTM) and several other network architectures on the benchmark task of framewise phoneme classification, using the TIMIT database. Our main findings are that bidirectional networks outperform unidirectional ones, and Long Short Term Memory (LSTM) is much faster and also more accurate than both standard Recurrent Neural Nets (RNNs) and time-windowed Multilayer Perceptrons (MLPs). Our results support the view that contextual information is crucial to speech processing, and suggest that BLSTM is an effective architecture with which to exploit it.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition

In this paper, we carry out two experiments on the TIMIT speech corpus with bidirectional and unidirectional Long Short Term Memory (LSTM) networks. In the first experiment (framewise phoneme classification) we find that bidirectional LSTM outperforms both unidirectional LSTM and conventional Recurrent Neural Networks (RNNs). In the second (phoneme recognition) we find that a hybrid BLSTM-HMM s...

متن کامل

Recognition of spontaneous conversational speech using long short-term memory phoneme predictions

We present a novel continuous speech recognition framework designed to unite the principles of triphone and Long ShortTerm Memory (LSTM) modeling. The LSTM principle allows a recurrent neural network to store and to retrieve information over long time periods, which was shown to be well-suited for the modeling of co-articulation effects in human speech. Our system uses a bidirectional LSTM netw...

متن کامل

Sound Signal Processing with Seq2Tree Network

Long Short-Term Memory (LSTM) and its variants have been the standard solution to sequential data processing tasks because of their ability to preserve previous information weighted on distance. This feature provides the LSTM family with additional information in predictions, compared to regular Recurrent Neural Networks (RNNs) and Bag-of-Words (BOW) models. In other words, LSTM networks assume...

متن کامل

CS 224D Final Project: Neural Network Ensembles for Sentiment Classification

We investigate the effect of ensembling on two simple models: LSTM and bidirectional LSTM. These models are used for fine-grained sentiment classification on the Stanford Sentiment Treebank dataset. We observe that ensembling improves the classification accuracy by about 3% over single models. Moreover, the more complex model, bidirectional LSTM, benefits more from ensembling.

متن کامل

Forward-Backward Convolutional LSTM for Acoustic Modeling

An automatic speech recognition (ASR) performance has greatly improved with the introduction of convolutional neural network (CNN) or long-short term memory (LSTM) for acoustic modeling. Recently, a convolutional LSTM (CLSTM) has been proposed to directly use convolution operation within the LSTM blocks and combine the advantages of both CNN and LSTM structures into a single architecture. Thi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Neural networks : the official journal of the International Neural Network Society

دوره 18 5-6  شماره 

صفحات  -

تاریخ انتشار 2005